A chromosome-level genome assembly provides insights into the environmental adaptability and outbreaks of Chlorops oryzae

Chlorops oryzae is a pest of rice that has caused severe damage to crops in major rice-growing areas in recent years. We generated a 447.60 Mb high-quality chromosome-level genome with contig and scaffold N50 values of 1.17 Mb and 117.57 Mb, respectively. Hi-C analysis anchored 93.22% scaffolds to 4 chromosomes. The relatively high expression level of Heat Shock Proteins (HSPs) and antioxidant genes in response to thermal stress suggests these genes may play a role in the environmental adaptability of C. oryzae. The identification of multiple pathways that regulate reproductive development (juvenile hormone, 20-hydroxyecdsone, and insulin signaling pathways) provides evidence that these pathways also play an important role in vitellogenesis and thus insect population maintenance. These findings identify possible reasons for the increased frequency of outbreaks of C. oryzae in recent years. Our chromosome-level genome assembly may provide a basis for further genetic studies of C. oryzae, and promote the development of novel, sustainable strategies to control this pest.

C hlorops oryzae (Diptera, Chloropidae) is an important pest of rice. Newly hatched larvae burrow into the stems of rice plants, then move to the growing tips where they feed on developing leaves and young panicles 1 (Fig. 1). Over the last century, this species has become widespread throughout Japan and Korea causing severe damage to rice crops 2 . In recent years, it has spread to mountainous and semi-mountainous regions and has caused severe crop damage in the country's main rice-growing regions, becoming China's most destructive rice pest. Despite the increasing economic impact of this pest, there is limited data on the major ecological characteristics of C. oryzae, such as environmental adaptability and frequency of outbreaks, due to a lack of genomic resources for this organism.
There are~400 insect species with genomes available at NCBI (https://www.ncbi.nlm.nih.gov/genome/browse#!/overview/ insects), and genomic tools, particularly the availability of a highquality assembled genomes, could help explain the molecular mechanism for C. oryzae invasion adaptability and outbreak frequency 3,4 . Several genome sequences have been reported in Diptera taxa, including Drosophila melanogaster 5 , Anopheles gambiae 6 , Musca domestica 7 , Ceratitis capitata 8 and Bactrocera dorsalis 4 , which have helped determine the molecular and genetic mechanisms of many biological problems. Unfortunately, no genome, particularly a chromosome-level genome, is available for C. oryzae.
Changing environmental conditions such as global warming are known to influence the frequency of insect outbreaks. The growth, development, and reproduction of insects can all be directly affected by temperature. For example, in multivoltine taxa such as the Aphididae and some Lepidoptera (e.g Pieris brassicae), higher temperature decreases development time, potentially increasing the number of generations produced per year 9 . Temperature also affects insect distribution and abundance 10 . According to the Intergovernmental Panel on Climate Change (IPCC), the planet is warming by around 0.6°C per annum, with global temperatures anticipated to rise by an average of 1.4-5.8°C by 2100 11 . Over the past 55 years, the average number of high-temperature days recorded in China has increased by 28.4% 12 . Consequently, this raises the question of how C. oryzae adapts to a warming environment. Furthermore, the emergence of insecticide resistance may be an additional factor influencing pest outbreaks. C. oryzae has become widespread and is increasing rapidly, making crop protection more challenging 13 . Insecticides are currently the primary means of controlling C. oryzae in the field. Growing evidence suggests that resistance to chemical insecticides is caused by decreased sensitivity of target-site proteins and increased metabolic detoxification of insecticides 14 . Metabolic resistance arises from the overexpression of detoxification enzyme genes, which belong to three major metabolic detoxification gene families: carboxylesterase, glutathione-S-transferases and cytochrome P450 13 . P450s are one of the largest gene families in all organisms, performing highly diverse physiological and biochemical functions essential for the detoxification and/or activation of heterologous and endogenous compounds 15 . Understanding the genomic characteristics that underpin high temperature adaptation and insecticide resistance is essential for developing effective prevention and control measures for this pest.
To determine the internal mechanisms underlying the adaptability and frequency of outbreaks of C. oryzae in recent years, we generated a high-quality chromosome-level genome assembly of C. oryzae through the combined application of PacBio and Illumina sequencing, the HiSeq X Ten platform, and Hi-C. We then conducted a comparative analysis using available insect genomes to gain a better understanding of the genomic evolution of C. oryzae. This reference genome has facilitated the identification of xenobiotic detoxification enzymes such as cytochrome P450s. Additionally, by combining transcriptome and qPCR technologies, we further investigated the molecular basis underlying the ability of C. oryzae to adapt to novel and changing environmental conditions. Finally, we discuss the results of functional studies on the reproductive development of C. oryzae. Our findings may provide a genetic basis for future research on C. oryzae outbreaks and promote the development of effective strategies to control this pest.

Results
Assembly and annotation of the C. oryzae genome. C. oryzae has a diploid chromosome number (2n) of 8. To analyze this genetic resource, we assembled the C. oryzae genome using PacBio long reads and Hi-C chromatin contact information.
Removing low-quality and short reads, left a total of 55.92 Gb clean reads for genome assembly, which k-mer analysis estimated to have a genome size of 462.76 Mb (Supplementary Fig. 1 and Supplementary Table 1). A total length of 448.90 Mb assembled genome obtained by using the third-generation sequencing, after correction with Illumina reads, we obtained 3407 contigs with a total length of 447.60 Mb and an N50 of 1.17 Mb. The assembly was then significantly improved, yielding 1575 scaffolds with a total length of 447.78 Mb and an N50 of 117.57 Mb. Hi-C chromatin contact information further supported 4 scaffolds being anchored, ordered, and oriented to give four chromosomes with >93% of assembled bases located on these (Fig. 2, Table 1 and  Supplementary Table 2). The completeness of the genome, evaluated by calculating the genome coverage rate for a set of singlecopy orthologous eukaryotes genes with BUSCO (v3.0.1), was estimated at 96.1% (Supplementary Table 3). Further analysis of GC content and sequencing coverage showed a normal distribution among assembled scaffolds ( Supplementary Fig. 2), which is indicative of low contamination of the assembly. More than 97% of consensus transcripts mapped to the assembly (Supplementary Table 4). These results suggest high accuracy and completeness of the genome assembly.
In total, 256.8 Mb of repeat sequences were identified, comprising 56.30% of the C. oryzae genome (Supplementary Table 5). DNA transposons and retroelements accounted for 7.25% and 14.09%, respectively. 5.03% were classified as long interspersed elements (LINEs), 0.01% as short interspersed elements (SINEs) and 9.05% as long terminal repeats (LTRs) of the genome. The protein-coding genes in the reference genome were predicted by EVidenceModeler (EVM) (http:// evidencemodeler.github.io/), a total of 17,259 gene models were predicted in the assembled genome as the reference gene set   Table 8).
Gene orthology and phylogenetic analysis. OrthoMCL (http:// OrthoMCL.org/OrthoMCL/) was used to identify orthologous genes in C. oryzae and 14 other insect species from six orders (Isoptera, Hemiptera, Hymenoptera, Coleoptera, Diptera and Lepidoptera). A total of 2298 single-copy orthologous genes and 1602 multiple-copy orthologous genes were identified (Supplementary Table 9). Protein sequences of the single-copy genes were used to infer phylogenetic relationships and estimate the divergence between species. The result indicates that C. oryzae diverged from C. capitata (both of which are members of the Cyclorrhapha) around 186 million years ago (Fig. 3).
Gene family expansion and contraction. We used CAFÉ software to study the gene family expansion and contraction of C. oryzae and related species during evolution. The results showed that compared with the common ancestor of C. oryzae and C. capitata, the C. oryzae genome displayed 561 expanded and 416 contracted gene families ( Supplementary Fig. 3).
The C. oryzae cytochrome P450 gene. The enhanced metabolization of insecticides by cytochrome P450 monooxygenases is a common insecticide resistance mechanism 16 . We identified 69 cytochrome P450 genes that mapped to the C. oryzae chromosomes (Fig. 4). Phylogenetic analysis of P450s clearly represents four major clans, i.e., the CYP2, the CYP3, the CYP4, and the mitochondrial (Mito) clade (Fig. 4). The distribution of P450 genes across the genome revealed 5 gene clusters with three or more P450 genes ( Supplementary Fig. 4).
Thermal stress response. Thermal stress can alter the permeability of insects' cell membranes, decrease their water content and affect their enzyme and protein activity. Heat Shock Proteins (HSPs) and antioxidant genes play important roles in protecting insects against these adverse consequences of thermal stress 17,18 . We manually annotated the HSP and antioxidant gene families in To investigate the role of stress response and antioxidant genes in ameliorating the adverse effects of heat stress in C. oryzae, we conducted a comparative transcriptomic analysis to identify genes that were differentially expressed in larvae that had been exposed to either normal, or high, temperatures. All 41,064 assembled unigenes were submitted to BLASTX for annotation in the NR, NT, SwissProt, KEGG, KOG, Pfam and GO databases. Pairwise comparisons among different temperature treatment groups showed that 1519 transcripts were upregulated and 1823 downregulated, between 24°C and 33°C (24°C as control group), that 1487 transcripts were upregulated and 6996 downregulated, between 24°C and 39°C (24°C as control group), and that 1641 transcripts were upregulated and 6232 downregulated, between 33°C and 39°C (33°C as control group) (Supplementary Data 2). All differentially expressed annotated genes were classified into three categories, "Biological process", "Cellular component" and "Molecular function", by Gene ontology (GO) analysis. "Cellular process" and "Metabolic process" were the most common subcategories in the "Biological process" category, whereas "Binding" and "Catalytic activity" were the most common subcategories in the "Molecular function" category (Supplementary Fig. 16-19).
We identified 62 candidate HSPs, 2 candidate CAT, 27 candidate GST, 10 candidate POD and 9 candidate SOD, genes in the different temperature treatment transcriptomes (Supplementary Fig. 20-24). We used qRT-PCR to validate RNA sequencing (RNA-seq) data by measuring the expression of stress response, and antioxidant, genes. The expression of antioxidant genes such as SOD, GST and POD was significantly affected by temperature ( Fig. 5 and Supplementary Fig. 25), suggesting that these genes are involved in the response of C. oryzae to high temperature stress. Consistent with the RNA-seq results, some HSP genes, such as HSP83, HSP70, HSP68, HSP67B2, HSP27 and HSP23, were also upregulated ( Fig. 5 and Supplementary Fig. 25).
Reproductive development. It is well known that the reproductive capacity of insects is critical to insect population outbreaks. Ovarian maturity is fundamental for female insect reproduction, which can be regulated by juvenile hormone (JH), 20-ecdysterone (20E) or insulin through regulating vitellogenesis 19 . We performed RNAi experiments to disrupt JH, 20E and insulin-like, signaling in newly emerged adult females, focusing on how ovarian development is regulated by different upstream signals. RNAi knockdown of vitellogenin (Vg) completely prevented ovary maturation ( Fig. 6 and Supplementary Fig. 26). Furthermore, RNAi knockdown of key genes in the JH pathway (Methoprenetolerant (Met) and Krüppel homolog 1 (Kr-h1)) reduced yolk deposition, thereby inhibiting ovarian development, but RNAi knockdown of Taiman (Tai, a binding partner of Met) had no effect on ovarian development ( Fig. 6 and Supplementary  Fig. 26). RNAi knockdown of key genes in the insulin pathway (InR (insulin receptor), FOXO (Forkhead box-containing protein)), TOR (Target of Rapamycin), and PI3K (phosphatidylinositol 3-kinase) and USP (ultraspiracle protein) (a key gene in the 20E pathway) also affected oocyte maturation and prevented ovarian development ( Fig. 6 and Supplementary Fig. 26). These results indicate that JH, insulin, and 20E are crucial for normal ovary maturation in C. oryzae.

Discussion
We generated a high-quality C. oryzae genome assembly by combing the PacBio Sequel system and HiSeq X Ten platform with Hi-C technology. Long-read sequencing and Hi-C assisted assembly strategies have previously produced high-quality genome assemblies of other animals 20,21 and plants 22,23 . Our results, including contig N50, Scaffold N50, GC content, BUSCO evaluation and the full-length transcripts, indicate that the reference  Cytochrome P450 is an ancient and large superfamily involved in the metabolism of exogenous and endogenous compounds 24 . This gene family has been well studied in insects due to its contribution to adaptation to exogenous compounds and pesticide resistance 21,25 . In some orders, the number of these detoxification family genes is related to the level of insecticide resistance. For example, compared to other blattodea species, the expanded cytochrome P450 monooxygenase gene family of American cockroaches Periplaneta americana is associated with higher insecticide resistance and survival under extreme conditions 26 . We predicted fewer P450 genes in C. oryzae than in other Diptera. However, since other studies have found no obvious link between the size of detoxification gene families and resistance, C. oryzae may not have any less detoxification capacity than other insects and size could be corelated with the breadth of the host range 27 . Further studies are required to identify key genes in the C. oryzae cytochrome P450 family.
Heat shock proteins are ubiquitous and evolutionarily conserved families of proteins in all living organisms that are critical for environmental adaptation 28 . HSPs usually act as molecular chaperones, facilitating the correct refolding of proteins and preventing the aggregation of denatured proteins, but they also participate in diverse cellular processes such as signal transduction, DNA replication, metabolic detoxification and immune defense reactions 29 . Furthermore, HSPs can be induced by extreme temperatures, oxidation, UV and heavy metals to help organisms withstand adverse environmental conditions 28 . Since their discovery in D. melanogaster larvae, HSPs have been found to be involved in the heat stress responses of many insects 30,31 . Our results show that the expression of several HSPs (HSP83, HSP70, HSP68, HSP67B2, HSP27 and HSP23) was significantly upregulated in the high temperature treatment groups relative to the control, which suggests that these genes play an important role in counteracting heat stress in C. oryzae.
HSP83 is a member of HSP90 family. HSP90 proteins often regulate the ability of animals to adapt to adverse environmental conditions and serve as the primary self-protection mechanism. Consistent with our results, HSP83 was upregulated in Sesamia nonagrioides after exposure to an elevated temperature (40°C) 32 . The others, HSP70, HSP68 and HSP67B2 belong to the most conserved HSP gene family-the HSP70 family 31,33 . Several studies have found that HSP70s play important roles in resistance to heat stress. In Drosophila, thermotolerance was significantly improved by the inducible expression of HSP70s 33 . Furthermore, the survival rate of female B. tabaci subject to heat stress dramatically decreased after HSP70 knockdown 34 and in Diaphorina citri HSP70 was significantly upregulated in insects subject to heat stress 18 . Besides that, HSP27 and HSP23 are members of the sHSP family, which has the most diverse functions among stressresponse proteins. Consistent with the results of a study on Chironomus riparius 35 , we found that expression of HSP27 and HSP23 markedly increased at high temperatures. Apart from responding to heat and oxidative stress, sHSPs may also be involved in diapause 36 , embryo formation, physiological regulation 37 and metamorphosis 38 . Overall, our results indicate that HSP83, HSP70, HSP68, HSP67B2, HSP27 and HSP23 play important roles in the ability of C. oryzae to tolerate thermal stress. Further research is, however, required to determine the specific roles of these genes in C. oryzae.
Heat stress causes a variety of physiological stress responses in insects, including increased production of reactive oxygen species (ROS) that can cause oxidative damage. Oxidative damage in proteins ranges from specific amino acid modifications and peptide breakage to the loss of enzyme activity 39 . To prevent such damage, organisms have developed antioxidant defense mechanisms, such as specific antioxidant systems (e.g., vitamins, glutathione, antioxidant enzymes, etc.) 40 . Our results show that exposure to high temperatures significantly affected the expression of SOD, GST and POD, which suggests that these antioxidant enzymes are involved in antioxidant responses to thermal stress in C. oryzae. SOD is known to play an important role in reducing the level of superoxide radicals induced by low, or high, ambient temperatures 41 . In Monochamus alternatus, the SOD gene was upregulated in larvae exposed to 40°C 17 . Similarly, we found that SOD expression was significantly higher in the 39°C treatment group than in the 24°C control group, suggesting that SOD was induced by exposure to high temperatures. SOD scavenges superoxide anions thereby protecting insects from thermal stress. GST is thought to participate in the inactivation of accumulated toxic, lipid, peroxidation products caused by oxidative damage and xenobiotics treatment 42 . We found that expression of GST was significantly higher in the 36°C treatment group than in the control, and similar results have been reported in Ostrinia furnacalis 43 and Panonychus citri 44 . Although GST was not significantly upregulated in the 39°C treatment group compared to the control, this could be because lipid peroxidation was mitigated by other antioxidant mechanisms. In addition to SOD and GST, insects also have POD, which breaks down H 2 O 2 45 . We found that POD expression was significantly lower in the temperature treatment groups relative to the control. Conversely, exposure to a high temperature (35°C) for 1 h dramatically increased POD activity in Aphidius gifuensis 46 , whereas there was no significant difference in POD expression between high temperature treatment groups and the control (25°C) in P. citri, even after the duration of exposure to high temperatures was increased 44 . Further research is required to understand the role of POD in the responses of insects to thermal stress.
Ovarian development, the most important part of the reproductive system of female insects 19 , is essential for maintaining insect populations. Ovarian maturity can be regulated by JH, 20E or insulin through regulating vitellogenesis 19 . Our RNAi experiments on newly emerged females demonstrate that JH, insulin and 20E are critical to the regulation of oocyte maturation and ovarian development in C. oryzae. Apparently, vitellogenesis and egg maturation are coordinated by three hormonal signals in this pest, which is not unexpected given the complex reproductive processes in dipterans 19 . And it also appears to be the case in P. americana 26 . These results highlight the importance of understanding the molecular mechanisms underlying hormonal signaling pathways during ovarian maturation. JH, which acts via Met, controls vitellogenesis and oocyte maturation. Knockdown of Met consequently inhibits JH-induced Vg expression, ovarian development and lipid accumulation 47 . As an early response gene in the JH signaling pathway, Kr-h1 has been confirmed to play an important role in yolk formation and ovarian development in Bactrocera dorsalis, Locusta migratoria and Helicoverpa armigera [47][48][49] . Consistent with these findings, our results show that Met and Kr-h1 RNAi depletion block ovarian maturation in C. oryzae. Similarly, knockdown of the endoplasmic reticulum glucose-regulated chaperone Grp78 gene, which is also regulated by JH, significantly inhibited follicular cell development and reproduction in L. migratoria 50 . These findings show that the JH signaling pathway regulates insect reproduction via multiple factors.
We found that knockdown of key genes in the insulin pathway (InR, FOXO, TOR, and PI3K) decreased yolk deposition and blocked ovarian development. The insulin signaling pathway is dependent on adequate nutrition, only female mosquitoes that have obtained a blood-meal can complete normal ovarian development. After a blood meal, amino acids activate TOR signaling, phosphorylate transcription activator S6K and transcription inhibitor 4E-BP, and RNAi Rheb, S6Kmediated gene, blocks Vg expression and egg maturation 51 . TOR signaling often regulates insect reproduction in combination with insulin signaling. Insulin binds to the insulin receptor InR, inducing phosphorylation of InR and interacting with the substrate to activate the PI3K pathway after which the normal transcription of Vg is regulated by cascade phosphorylation (PDK), protease B (Akt/PKB) and FOXO 52 . RNAimediated silencing of InR has been found to have negative effects on insect reproduction in several species, confirming the role of the insulin pathway in controlling reproductive processes 53,54 . In addition to the direct effect of the insulin pathway on vitellogenesis, an interaction between this and other pathways may also regulate ovarian development in C. oryzae. In some insects, insulin/TOR signaling regulates ovarian maturation by affecting JH signaling or JH biosynthesis 55,56 . In adult Drosophila, InR mutations led to a decrease in JH titer 57 , whereas silencing TOR and starvation caused a significant decrease in the levels of JH synthase and JH synthesis mRNA in the corpora allata of adult female cockroaches 58 . The TOR nutritional signaling pathway was found to have a similar effect on JH biosynthesis in A. aegypti and N. lugens 59,60 . In Blattella germanica, knockdown of InR was found to inhibit JH biosynthesis and decrease Vg expression, thereby blocking ovarian development 53 . The regulatory networks controlling ovarian maturation in insects are clearly complicated, and additional research is therefore required to understand the molecular mechanisms controlling this process in C. oryzae.
C. oryzae outbreaks frequently in recent years. Adaptation to warmer temperatures may have increased the frequency of outbreaks. In addition, pesticide resistance also facilitates C. oryzae outbreaks by making the species more difficult to control. Reproductive development is also crucial for maintaining insect populations. In summary, this paper provides insights on possible reasons for C. oryzae frequent outbreaks in recent years. Our chromosome-level genome assembly should both facilitate future genetic research on the causes of C. oryzae outbreaks and support the development of sustainable control strategies for this pest.

Methods
Insects. C. oryzae larvae were collected in 2019 in Hanshou County, Hunan province, China, and reared on fresh rice stems in the laboratory. Larvae were kept at 24 ± 1°C and >80% relative humidity, under a photoperiod of 16:8 (L:D) h. Larvae were used for Illumina sequencing for transcriptome analysis, and 100 female adults were collected for Illumina, PacBio, and Hi-C sequencing for genome analysis.
Genome sequencing. The whole genome was sequenced on the PacBio Sequel System (https://www.pacb.com/products-and-services/pacbio-systems/sequel/) based on single-molecule real-time (SMRT) sequencing technology. The template library was constructed using a SMRTbell Template Prep Kit 1.0 and a SMRTbell Damage Repair Kit. Following the procedure described in the PacBio brochure ">20 kb Template Preparation Using BluePippin ™ Size-Selection System (15-20 kb Cutoff) for Sequel ™ Systems", the quality DNA was fragmented with g-TUBE (covaries, 520079), concentrated with AMPure ® PB magnetic beads and the fragments eluted with Pacific Biosciences ® Elution Buffer. The fragments were damagerepaired with ExoVII, end-repaired with End Repair Mix and ligated with the blunt adapter. After removing failed ligation products with ExoIII and ExoVII, ligation products were purified twice with AMPure ® PB Beads, and selected for size with the BluePippin ™ Size-Selection System. The fragments obtained were bead-purified, damage-repaired, and used as~20 kb SMRTbell templates. These templates were annealed with primers and bound to DNA polymerase using a PacBio DNA/ Polymerase Kit and magnetic beads, and loaded into the PacBio Sequel™ System for sequencing.
Generation of short reads for genome correction. In order to collect Illumina paired-end reads, we used agarose electrophoresis (1% agarose gels) to check for possible degradation and contamination of genomic DNA, determined its purity with a NanoPhotometer® (IMPLEN, CA, USA) and measured its concentration with a Qubit ® 2.0 Fluorometer (Life Technologies, CA, USA). Only genomic DNA that passed these quality controls was included in the short fragment library constructed following the TruSeq DNA Sample Preparation Guide (Illumina, 15026486 Rev. C). This procedure mainly included the steps of DNA fragmentation, end-repairing, base "A" tailing, ligation of adapters, the recovery of DNA of the required size from gels and PCR amplification of the recovered DNA. Amplification products were used as libraries for sequencing once they passed quality checks. In brief, amplification products were quantified with Qubit2.0 and their size range determined with Agilent 2100. If fragments were within the expected size range, the library was accurately quantified with a Bio-RAD CFX 96 real time quantitative PCR thermocycler and a Bio-RAD KIT iQ SYBR GRN Q-PCR thermocycler. The quality library was sequenced on a HiSeq X Ten Platform set to the PE150 program and paired-end reads obtained.
K-mer analysis. Before genome assembly, genome features can be estimated from the sequences obtained by sequencing. We used the analysis method based on K-mer to estimate the genome size. We iteratively selected the sequence with the length of K bases from a continuous sequence. If the length of each sequence is L, the length of K-mer is K, then L-K + 1 K-mer can be obtained. Here, we took K = 21 for analysis. The distribution of K-mers depends on the characteristics of the genome and follow a Poisson's distribution.
Genome assembly. The quality of the reads exported by Sequel™ Systems was evaluated with the in-built High Quality Region Finder (HQRF), which identifies the longest high quality region generated for each read by a singly-loaded DNA polymerase according to the signal to noise ratio. Upon generation by the system, all bases were marked with "!" in order to perfect the format. High-quality reads (or regions) were marked with "0.8" and low-quality ones with "0".
High-quality reads were assembled into contigs using Canu (v1.5 https://github. com/marbl/canu) by setting the parameters as follows: canu -pacbio-raw sample.subreads.fasta -p sample -d sample-canu genomeSize = 40 m gridEngineMemoryOption = "-l vf = MEMORY". Canu used all-versus-all overlap information to correct individual reads. It selected these overlaps in a two-step COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-022-03850-7 ARTICLE COMMUNICATIONS BIOLOGY | (2022) 5:881 | https://doi.org/10.1038/s42003-022-03850-7 | www.nature.com/commsbio filtration process comprised of global and local filtration. Global filtration identified targets where a read may provide correction support, whereas local filtration allowed a read to accept, or reject, the correction evidence provided by other reads. At the trimming stage, Canu identified the region of each read without correction support, trimming or splicing reads into their longest regions with correction support. These regions were subject to a final check for sequencing errors, and then used to construct the best overlap graph based on the output contigs and to compile summary statistics.
The consensus genome was subject to a final round of base-error correction (polishing) by referring to the Illumina reads with BWA (v0.7.9a) and Pilon (v1.22, https://github.com/broadinstitute/pilon). The Illumina paired-end reads were mapped to the contigs by BWA (parameter, -k 30), after which Pilon (v1.22) (default parameters) used this alignment to correct the assembly. The quality of the genome sequence obtained was further evaluated with BUSCO (v3.0.1, http:// busco.ezlab.org/) (default parameters) based on a set of single-copy orthologous eukaryotes genes.
Hi-C. We performed Hi-C sequencing to facilitate assembly of the C. oryzae genome. After crosslinking, samples were used for quality control. The Hi-C library was then prepared and sequenced on the Illumina Novaseq platform with 2 × 150 bp reads at Annoroad Gene Technology Co. Ltd. (Supplementary Table 10; Supplementary Fig. 27). We first used the bowtie 2 end-to-end algorithm to align cleaned reads with the reference genome 61 . Unmapped reads mainly consisted of chimeric fragments that spanned the ligation junction. According to the Hi-C introduction and fill-in strategy, HiC-Pro (v2.7.8) was used to detect the ligation site with an exact matching program and to align the 5' segment read on the genome 62 . The results of each mapping step were then merged into a single alignment file. Lachesis, the assembly package, was used to cluster, order and orient reads. Finally, we cut the chromosomes predicted by Lachesis into equal-length bins, such as 1 Mb or 500Kb, and constructed a heat map according to the interaction signals revealed by the effective mapping between bins ( Supplementary  Fig. 27).
Genome annotation. Two methods, homologous sequence prediction and ab initio prediction, were used to predict repetitive sequences. Homologous sequence prediction is based on RepBase (https://www.girinst.org/server/RepBase/index.php), a repeat sequence database. RepeatMasker and RepeatProteinMask were used to predict sequences similar to known repeat sequences 63 . RepeatModeler (http:// www.repeatmasker.org/RepeatModeler/) was used in ab initio prediction. First, a de novo repeat sequence library was established by RepeatModeler, and then repeat sequences were predicted by RepeatMasker. In addition, the ab initio prediction method was also used to find tandem repeat sequences in the genome with TRF software 64 .
Gene structure prediction was performed using three strategies: evidential support of transcriptional data, homologous prediction, and ab initio prediction. For the evidential support of transcriptional data, we used EST/CDA sequence and genome alignment to predict gene structure, with the commonly used software PASA (http://pasa.sourceforge.net/) 65 . In homologous prediction, the coding protein sequences of known homologous species (Drosophila melanogaster, Ceratitis capitata, and Lucilia cuprina) were compared with genome sequences of C. oryzae and the gene structure was predicted by BLAST (http://blast.ncbi.nlm. nih.gov/Blast.cgi) 66 , Genewise (http://www.ebi.ac.uk/~birney/wise2/) 67 . Software based on the statistical characteristics of genomic sequence data (such as codon frequency, exon-intron distribution) was used to predict gene structure in ab initio prediction. The most commonly used software packages are Augustus (http:// augustus.gobics.de/), SNAP (https://github.com/KorfLab/SNAP) and GeneMark (http://exon.gatech.edu/GeneMark/). Finally, to synthesize the above forecast results, the gene sets predicted by each strategy were integrated into a nonredundant and more complete gene set with EVidenceModeler (EVM) (http:// evidencemodeler.github.io/) 68 .
Gene family expansion and contraction analysis. According to the cluster analysis results of gene families, filtering them to remove gene families whose number of genes is >200 in one species and <2 in other species, and gene families whose total number of genes in the gene family is less than the number of species family. Then, using the CAFÉ software (http://sourceforge.net/projects/ cafehahnlab) with PGM (probabilistic graphical models) model to simulate the acquisition and loss of genes under the specified evolutionary tree, and analyzing the expansion and contraction of gene family through hypothesis test.
Gene family identification and analysis. We first downloaded a set of reference protein sequences from NCBI GenBank and used Hidden Markov models (HMMs) to obtain references for gene identification. SPDE (v1.2) 73 (E-value ≤ 1e-5) was used to search for candidate genes in the C. oryzae genome. Then, genes were manually annotated by BLASTP and GENEWISE, and the number of genes is consistent with that the number we identified. 62 genes are complete and 7 genes needed to be fixed. After which a neighbor-joining tree was constructed using MEGA7 74 with the Poisson correction method and 1000 bootstrap replicate searches. The final phylogenetic tree was prepared in iTOL (v5) (http://itol.embl. de) and Adobe Illustrator (Adobe Systems, San Jose, CA, USA). The phylogenetic tree of the target gene family was constructed using genes from C. oryzae, D. melanogaster, L. cuprina, and C. capilata (Supplementary Data 3).
Location of P450 genes, HSP genes and antioxidant genes on the chromosome. To locate all identified genes on the chromosome, we first used SPDE (v1.2) 73 (E-value ≤ 1e-5) to search for candidate genes in the C. oryzae genome and mapped those found onto the chromosome using a GFF3 file and TBtools (v1.071) 75 .
Transcriptome sequencing and analysis during larval temperature stress experiments. C. oryzae causes economically damage to rice crops and can complete second and third generations under high temperatures, which can result in outbreaks. To understand the responses of C. oryzae larvae to temperature stress, larvae were randomly assigned to one of three temperature treatment groups: 33°C, 36°C and 39°C. Each group was comprised of 20 larvae and had three biological replicates. Larvae in each group were subjected to one of the above temperature treatments for 2 h whereas the control group was kept at 24°C. At the end of the experiment larvae were frozen in liquid nitrogen for 5 min, then stored at −80°C until required. 1.5 μg of RNA from each sample was used to construct a cDNA (Complementary DNA) library. An RNA-seq library was sequenced on an Illumina Hiseq platform. SOAPnuke, a self-developed filtering software, was used to compile statistics, and raw reads were cleaned by filtering them against reads containing adapters, poly-N and low-quality reads with trimmomatic. De novo assembly of clean reads (the removal of PCR duplicates to improve assembly efficiency) was conducted using Trinity, after which the assembled transcripts were clustered and de-duplicated using Tgicl to obtain Unigenes. Clean reads were aligned to the genomic sequence with Bowtie 2 61 , after which the gene expression level of each sample was calculated using RSEM 76 . The DEGseq method is based on the Poisson distribution. DEGs were detected using the method described in Wang, Feng, Wang, Wang, and Zhang (2010) 77 . P-values were adjusted using Benjamini and Storey's approach to control false positives. Genes with a ≥ 2-fold difference in expression, and an adjusted P-value ≤ 0.001, were considered significantly differentially expressed.
RNA interference. RNAi was used to determine the function of target genes in oocyte maturation using the EGFP (enhanced green fluorescent protein, GenBank Accession No. U55762) gene as a parallel control. For double-strand RNA (dsRNA) preparation specific primers (Supplementary Data 4) conjugated with the T7 promoter sequence were first used for PCR amplification, and the resultant PCR products were used as templates for dsRNA synthesis. dsRNA was synthesized using the T7 RiboMAX Express RNAi System (Promega, Madison, WI, United States) according to the manufacturer's instructions. 500 ng dsRNA was injected into each newly emerged adult female in the treatment group and the same dosage of dsEGFP injected into those in the control group. Three biological replicates were performed. The ovarian morphology of females in each group was observed under a stereomicroscope (Motic SMZ-161, Motic Group Co., Xiamen, China) 72 h after dsRNA injection.
Total RNA extraction and quantitative real-time PCR. Total RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. RNA purity was verified with gel electrophoresis and its concentration measured with a Qubit ® RNA Assay Kit in Qubit® 2.0 Flurometer (Life Technologies, CA, USA). qPCR primers were designed using the NCBI profile server (http://www.ncbi.nlm.nih.gov/tools/primer-blast)(see Supplementary Data 4 for a list of the primers used). cDNA was synthesized using a PrimeScript RT Reagent Kit with gDNA Eraser (Perfect Real Time) (Takara, Dalian, China) according to the manufacturer's instructions. cDNA templates were diluted 5 times with deionized water. qRT-PCR was performed on a CFX96 Touch TM Real-Time PCR Detection System (Bio-Rad Laboratories, Hercules, CA, USA) in a reaction volume of 20 μl using TB Green TM Premix Ex Taq TM II (Takara), according to the manufacturer's instructions. RPS15 (ribosomal protein S15) and RP49 (ribosomal protein 49) were the internal references genes 78 . A two-step program was performed as follows: 95°C for 30 s, 40 cycles at 95°C for 10 s and 59°C for 30 s. A melting curve analysis was performed from 55°C to 95°C to determine the specificity of the qPCR primers and their efficiency was verified by calculating a standard curve (cDNA concentration vs. Ct) based on the dilution gradient of the templates. The 2 -ΔΔCt method was used to calculate the relative expression levels of target genes 79 .
Statistics and reproducibility. Statistical analysis of the qRT-PCR results was conducted in GraphPad Prism 8 software (GraphPad Software Inc., San Diego, CA, United States). Data are presented as mean ± standard error (SE). The statistical significance of differences in the expression of genes among temperature treatment groups was analyzed with one-way analysis of variance (ANOVA) followed by Tukey's honestly significant difference test for multiple sample comparisons (P < 0.05). The statistical significance of differences in gene expression between the RNAi treatment and control group was evaluated with Student's t-test ( * P < 0.05, * * P < 0.01, * * * P < 0.001). Go term analysis of upregulated pathways was performed using the OmicShare tools, a free online platform for data analysis (https://www. omicshare.com/tools) 80 .
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
All raw data from the Chlorops oryzae genome have been deposited in the SRA under SRR14340331 (BioProject PRJNA728371). This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession JAIPUU000000000. The C. oryzae transcriptome data for genome annotation are also stored in the SRA (SRR7528441